183 research outputs found

    On Graphical Models via Univariate Exponential Family Distributions

    Undirected graphical models, or Markov networks, are a popular class of statistical models, used in a wide variety of applications. Popular instances of this class include Gaussian graphical models and Ising models. In many settings, however, it might not be clear which subclass of graphical models to use, particularly for non-Gaussian and non-categorical data. In this paper, we consider a general subclass of graphical models where the node-wise conditional distributions arise from exponential families. This allows us to derive multivariate graphical model distributions from univariate exponential family distributions, such as the Poisson, negative binomial, and exponential distributions. Our key contributions include a class of M-estimators to fit these graphical model distributions, and rigorous statistical analysis showing that these M-estimators recover the true graphical model structure exactly, with high probability. We provide examples of genomic and proteomic networks learned via instances of our class of graphical models derived from Poisson and exponential distributions. Comment: Journal of Machine Learning Research

    On the Reproducibility of TCGA Ovarian Cancer MicroRNA Profiles

    Dysregulated microRNA (miRNA) expression is a well-established feature of human cancer. However, the role of specific miRNAs in determining cancer outcomes remains unclear. Using Level 3 expression data from the Cancer Genome Atlas (TCGA), we identified 61 miRNAs that are associated with overall survival in 469 ovarian cancers profiled by microarray (p<0.01). We also identified 12 miRNAs that are associated with survival when miRNAs were profiled in the same specimens using Next Generation Sequencing (miRNA-Seq) (p<0.01). Surprisingly, only 1 miRNA transcript is associated with ovarian cancer survival in both datasets. Our analyses indicate that this discrepancy arises because miRNA levels reported by the two platforms correlate poorly, even after correcting for potential issues inherent to signal detection algorithms. Further investigation is warranted.

    Comprehensive evaluation of RNA-seq quantification methods for linearity

    Figure S3. Concordant analysis between rank of estimated quantifications and rank of measured abundance value at gene level (a) and isoform level (b). The fitted value on the y-axis is estimated from the model D∼m×A+n×B+ε. Ranks were normalized by the number of quantifications in each plot. (PDF 5950 kb)
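
The model D∼m×A+n×B+ε in the caption above can be illustrated with an ordinary least-squares fit. The following is only a minimal sketch: it assumes that A, B and D are equal-length abundance vectors and that m and n are scalar coefficients, which the caption does not spell out. The function name `fit_two_term_model` and the toy numbers are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple, List

def fit_two_term_model(
    a: Sequence[float], b: Sequence[float], d: Sequence[float]
) -> Tuple[float, float, List[float]]:
    """Least-squares fit of d ~ m*a + n*b (no intercept).

    Solves the 2x2 normal equations directly; returns (m, n, fitted values).
    Assumption: a, b, d are equal-length vectors of abundances.
    """
    if not (len(a) == len(b) == len(d)):
        raise ValueError("a, b and d must have the same length")
    s_aa = sum(x * x for x in a)
    s_bb = sum(y * y for y in b)
    s_ab = sum(x * y for x, y in zip(a, b))
    s_ad = sum(x * z for x, z in zip(a, d))
    s_bd = sum(y * z for y, z in zip(b, d))
    det = s_aa * s_bb - s_ab * s_ab
    if det == 0:
        raise ValueError("a and b are collinear; m and n are not identifiable")
    m = (s_ad * s_bb - s_bd * s_ab) / det
    n = (s_bd * s_aa - s_ad * s_ab) / det
    fitted = [m * x + n * y for x, y in zip(a, b)]
    return m, n, fitted

# Toy data (made up): d is an exact 0.25/0.75 combination of a and b.
a = [1.0, 2.0, 3.0, 4.0]
b = [4.0, 1.0, 0.0, 2.0]
d = [0.25 * x + 0.75 * y for x, y in zip(a, b)]
m, n, fitted = fit_two_term_model(a, b, d)
print(m, n)
```

With noise-free toy data the fit recovers the coefficients used to build `d`; with real quantifications the fitted values would be compared against the measured ones, for example by rank as the caption describes.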

    Sequence space coverage, entropy of genomes and the potential to detect non-human DNA in human samples

    Background: Genomes store information for building and maintaining organisms. Complete sequencing of many genomes provides the opportunity to study and compare global information properties of those genomes. Results: We have analyzed aspects of the information content of Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana, Saccharomyces cerevisiae, and Escherichia coli (K-12) genomes. Virtually all possible (> 98%) 12 bp oligomers appear in vertebrate genomes while < 2% of 19 bp oligomers are present. Other species showed different ranges of > 98% to < 2% of possible oligomers in D. melanogaster (12-17 bp), C. elegans (11-17 bp), A. thaliana (11-17 bp), S. cerevisiae (10-16 bp) and E. coli (9-15 bp). Frequencies of unique oligomers in the genomes follow similar patterns. We identified a set of 2.6 M 15-mers that are more than 1 nucleotide different from all 15-mers in the human genome and so could be used as probes to detect microbes in human samples. In a human sample, these probes would detect 100% of the 433 currently fully sequenced prokaryotes and 75% of the 3065 fully sequenced viruses. The human genome is significantly more compact in sequence space than a random genome. We identified the most frequent 5- to 20-mers in the human genome, which may prove useful as PCR primers. We also identified a bacterium, Anaeromyxobacter dehalogenans, which has an exceptionally low diversity of oligomers given the size of its genome and its GC content. The entropy of coding regions in the human genome is significantly higher than non-coding regions and chromosomes. However, chromosomes 1, 2, 9, 12 and 14 have a relatively high proportion of coding DNA without high entropy, and chromosome 20 is the opposite with a low frequency of coding regions but relatively high entropy.
Conclusion: Measures of the frequency of oligomers are useful for designing PCR assays and for identifying chromosomes and organisms with hidden structure that had not been previously recognized. This information may be used to detect novel microbes in human tissues.
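
The notion of sequence space coverage used in this abstract (the share of all possible oligomers of a given length that actually occur in a genome) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: it scans a single strand only, skips windows containing non-ACGT characters, and uses a made-up toy sequence; the function name `kmer_coverage` is hypothetical.

```python
def kmer_coverage(sequence: str, k: int) -> float:
    """Fraction of the 4**k possible DNA k-mers that occur in `sequence`.

    Assumptions: single strand only; windows containing characters
    other than A, C, G, T (for example N) are ignored.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    alphabet = set("ACGT")
    seq = sequence.upper()
    seen = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= alphabet:
            seen.add(kmer)
    return len(seen) / 4 ** k

# Toy sequence (made up); real genomes would be streamed from FASTA files.
toy_genome = "ACGTACGTTGCAAGCT"
for k in (1, 2, 3):
    print(k, kmer_coverage(toy_genome, k))
```

Coverage falls as k grows because the number of possible oligomers, 4^k, increases much faster than the number of windows in a fixed-length sequence, which matches the pattern the abstract reports going from short to long oligomers.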

    GmWRKY16 Enhances Drought and Salt Tolerance Through an ABA-Mediated Pathway in Arabidopsis thaliana

    The WRKY transcription factors (TFs) are one of the largest families of TFs in plants and play multiple roles in plant development and stress response. In the present study, GmWRKY16, which encodes a WRKY transcription factor in soybean, was functionally characterized in Arabidopsis. GmWRKY16 is a nuclear protein that contains a highly conserved WRKY domain and a C2H2 zinc-finger structure and has transcriptional activation ability. It shows a constitutive expression pattern, with relative expression levels of over fourfold in the old leaves, flowers, seeds and roots of soybean. Quantitative real-time polymerase chain reaction (qRT-PCR) showed that GmWRKY16 could be induced by salt, alkali, ABA, drought and PEG-6000. Compared with the control, overexpression of GmWRKY16 in Arabidopsis increased the seed germination rate and seedling root growth of transgenic lines under higher concentrations of mannitol, NaCl and ABA. In addition, GmWRKY16 transgenic lines showed a survival rate of over 75% after rehydration and enhanced tolerance to salt and drought, with higher proline and lower MDA accumulation, less water loss from detached leaves, and more endogenous ABA than the control under stress conditions. Further studies showed that AtWRKY8, KIN1, and RD29A were induced in GmWRKY16 transgenic plants under NaCl treatment. The expression of the ABA biosynthesis gene (NCED3), signaling genes (ABI1, ABI2, ABI4, and ABI5), responsive genes (RD29A, COR15A, COR15B, and RD22) and stress-related marker genes (KIN1, LEA14, LEA76, and CER3) was regulated in transgenic lines under drought stress. In summary, these results suggest that GmWRKY16, as a WRKY TF, may promote tolerance to drought and salt stresses through an ABA-mediated pathway.

    Mechanisms of FUS1/TUSC2 deficiency in mesothelioma and its tumorigenic transcriptional effects

    Background: FUS1/TUSC2 is a novel tumor suppressor located in the critical 3p21.3 chromosomal region frequently deleted in multiple cancers. We previously showed that Tusc2-deficient mice display a complex immuno-inflammatory phenotype with a predisposition to cancer. The goal of this study was to analyze possible involvement of TUSC2 in malignant pleural mesothelioma (MPM), an aggressive inflammatory cancer associated with exposure to asbestos.
    Methods: TUSC2 insufficiency in clinical specimens of MPM was assessed via RT-PCR (mRNA level), Representational Oligonucleotide Microarray Analysis (DNA level), and immunohistochemical evaluation (protein level). A possible link between TUSC2 expression and exposure to asbestos was studied using asbestos-treated mesothelial cells and ROS (reactive oxygen species) scavengers. Transcriptional effects of TUSC2 in MPM were assessed through expression array analysis of TUSC2-transfected MPM cells.
    Results: Expression of TUSC2 was downregulated in ~84% of MM specimens, while loss of the TUSC2-containing 3p21.3 region was observed in ~36% of MPMs, including stage 1 tumors. Exposure to asbestos led to a transcriptional suppression of TUSC2, which we found to be ROS-dependent. Expression array studies showed that TUSC2 activates transcription of multiple genes with tumor suppressor properties and down-regulates pro-tumorigenic genes, thus supporting its role as a tumor suppressor. In agreement with our knockout model, TUSC2 up-regulated IL-15 and also modulated more than 40 other genes (~20% of total TUSC2-affected genes) associated with the immune system. Among these genes, we identified CD24 and CD274, key immunoreceptors that regulate immunogenic T and B cells and play important roles in systemic autoimmune diseases. Finally, the clinical significance of TUSC2 transcriptional effects was validated on the expression array data produced previously on clinical specimens of MPM. In this analysis, 42 TUSC2 targets proved to be concordantly modulated in MM, serving as disease discriminators.
    Conclusion: Our data support the immuno-therapeutic potential of TUSC2, define its targets, and underscore its importance as a transcriptional stimulator of anti-tumorigenic pathways.

    Quantitative evaluation of reservoir quality of tight oil sandstones in Chang 7 Member of Ordos Basin

    To establish a quantitative evaluation system for the reservoir quality of tight oil sandstones, this study takes the Chang 7 Member in the Maling area of the Ordos Basin as an example and uses nuclear magnetic resonance (NMR), clay mineral analysis, high-pressure mercury injection analysis and logging interpretation to comprehensively evaluate the pore structures, sand body structures and oil-bearing properties of tight oil sandstone reservoirs. The results show that the pseudo-capillary pressure curves transformed from the NMR T2 spectra are consistent with the capillary pressure curves measured in core experiments, so this method can be used for accurate characterization of reservoir pore structures. The pore structure parameters calculated from the pseudo-capillary pressure curves accurately reflect pore structure features of the reservoirs such as micropores with thin throats and complex tortuosity. In addition, the smoothness of conventional logging curves is used to evaluate the sand body structures and heterogeneity of the reservoir, and the apparent energy storage coefficient is introduced to quantitatively evaluate the oil-bearing properties of tight oil reservoirs. The evaluation results are in good agreement with actual production: the larger the apparent energy storage coefficient, the higher the initial output of the oil wells. Because the reservoir quality evaluation constructed in this paper is highly consistent with production status, the method has broad application prospects.